-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate external transform wrappers using a script #29834
Conversation
…IO direct GHA workflow
Assigning reviewers. If you would like to opt out of this review, comment R: @jrmccluskey for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
R: @robertwb @tvalentyn @chamikaramj (manually requesting to make the review bot happy) |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
.github/workflows/beam_PreCommit_Xlang_Generated_Transforms.yml
Outdated
Show resolved
Hide resolved
// setup test env | ||
def serviceArgs = project.project(':sdks:python').mapToArgString(expansionServiceOpts) | ||
executable 'sh' | ||
args '-c', ". ${project.ext.envdir}/bin/activate && $pythonDir/scripts/run_expansion_services.sh stop --group_id ${project.name} && $pythonDir/scripts/run_expansion_services.sh start $serviceArgs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't do this now, could you at least file a bug and drop a TODO.
Ah, yes, that's right. Hopefully the file generation itself isn't *that*
slow.
…On Mon, Feb 12, 2024 at 3:02 PM Ahmed Abualsaud ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sdks/python/setup.py
<#29834 (comment)>:
> + # if exists, this directory will have at least its __init__.py file
+ if (not os.path.exists(generated_transforms_dir) or
+ len(os.listdir(generated_transforms_dir)) <= 1):
+ message = 'External transform wrappers have not been generated '
+ if not script_exists:
+ message += 'and the generation script `gen_xlang_wrappers.py`'
+ if not config_exists:
+ message += 'and the standard external transforms config'
+ message += ' could not be found'
+ warnings.warn(message)
+ else:
+ warnings.warn(
+ 'Skipping external transform wrapper generation as they '
+ 'are already generated.')
+ return
+ out = subprocess.run([
This will make pretty much every run of setup.py slow, right?
Not anymore, since we decided to decouple the generation steps. setup.py
expects the transform config to already exist, so no expansion + discovery
is done here. All it does is generate files based on the existing config.
—
Reply to this email directly, view it on GitHub
<#29834 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AADWVAJPRS6PTJHJGTPTHE3YTKNPZAVCNFSM6AAAAABA4ZG2QSVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTQNZWGUYTSNZSGA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
That will work, alternatively, you can have dedicated functions: to generate config, and to generate wrappers, and make necessary imports only in functions where imports are required; you might have to add |
@tvalentyn this is already the case, the steps are in different functions. e.g. we're only importing |
…lly starting each one beforehand; skip existing transforms from gcp service config; cleanup precommit test config
…suites, where expansion service wouldn't be built
…path, use simple function to return beam jar exp service
The newly added precommit that performs a transform config sync check (beam_PreCommit_Xlang_Generated_Transforms) is running green on my fork: https://github.com/ahmedabu98/beam/actions/runs/7933231703/ Will merge after current tests pass |
…wrappers_script
Implementing a script that generates wrappers for external SchemaTransforms, according to Option #3 in the following design doc: https://s.apache.org/autogen-wrappers
The script's workflow takes place in setup.py, which can be invoked for local setup or for building the SDK for a Beam release. Files are generated in a subdirectory that is ignored by git. From there, the wrappers can be imported into relevant
__init__.py
files.Wrappers are generated along with any documentation provided from the underlying SchemaTransform, and is in compliance with existing linting and pydoc rules.
With the expansion service YAML config, the following transform YAML config is generated:
Transform YAML config
From this config, external transform wrappers are created and stored in appropriate modules. For example, our config gives us the following
apache_beam/transforms/xlang/io.py
file:GenerateSequence wrapper
Including documentation for how the script is used, and unit tests for different parts of the script.
Adding a gradle command
./gradlew generateExternalTransformsConfig
to build the configs (it takes care of building the relevant expansion jars beforehand). Once thAlso adding a PreCommit test that generates the transform config from scratch and compares it with the existing one. This serves to indicate whether a change will render the existing config out of sync. To resolve, one would re-generate the config (with
./gradlew generateExternalTransformsConfig
) and commit the changes.